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In this report the combination of an iterative technique, the conjugate gradient algorithm, with a fast direct 
method, cyclic reduction, is used to solve the linear algebraic equations resulting from discretization of a 
nonseparabie elliptic partial differential equation. An expository discussion of the conjugate gradient and 
preconditioned conjugate gradient algorithms and of their use in the solution of partial differential equations is 
presented. New results extending the use of the preconditioned conjugate gradients technique to singular linear 
equations which arise from discretized elliptic equations with Neumann boundary conditions are also given. The 
algorithms are applied to solve a specific elliptic equation which arises in the study of buoyant convection pro- 
duced by a room fire. A code was developed to implement the algorithms for this application. Numerical results 
obtained through testing and use of the code are discussed. 
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The numerical solution of the linear algebraic equations which result from a discretization of an elliptic 
partial differential equation has been, and continues to be, the focus of much research in numerical 
analysis. Over the past 25 years there have been many advances, some toward improved iterative schemes 
(ADI, SOR, etc.), others toward fast direct methods (most notably those based on FFT's). See Rice [1] ' for a 
brief survey of the impact of these advances. In this paper we discuss the combination of an iterative tech- 
nique, the conjugate gradient algorithm, with a fast direct method, cyclic reduction, to extend the capabil- 
ities of the fast solver. For examples of the use of similar combinations of algorithms, see Concus & Golub 
[2], Concus, Golub and O'Leary [3], O'Leary [4] and O'Leary and Widlund [5]. 

The work described in this paper resulted from a study of buoyant convection carried out at the National 
Bureau of Standards, [6, 7, 8]. The specific elliptic partial differential equation for pressure arising in this 
work is used as a model problem in the present paper. The experience of the first author with the buoyant 
convection model motivated the writing of sections 1 and 2, which gives an exposition of the conjugate gra- 
dient and preconditioned conjugate gradient algorithms. These sections contain no new material; they were 
written so that this paper may be accessible to an audience unfamiliar with the development of the conju- 
gate gradient algorithm and its use in the solution of partial differential equations. Section 3 has a discus- 
sion of the model problem, the pressure equation. Section 4 contains several new results extending the use 
of preconditioned conjugate gradient technology to the singular linear equations which result from 
Neumann boundary value problems. We conclude with some numerical examples from the buoyant convec- 
tion problem. A listing of a FORTRAN program implementing the algorithm discussed here is presented in 
[15]. 
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1 . An introduction to the conjugate gradient algorithm 

The discretization of an elliptic differential operator by standard finite difference or finite element tech- 
niques produces, as the finite analog of the continuous operator, very large matrices with most elements 
zero. From an algebraic standpoint we can very often view the numerical solution of the differential equa- 
tion as: 

Solve Ax — h (1) 

where A is a k by k, (k large) real symmetric, positive definite and sparse matrix. 

Fast direct methods with minimal storage requirements exist when A is the finite difference operator 
resulting from a separable elliptic operator on a rectangular region, and for some other common, but spe- 
cific, cases. If the operator is nonseparable, or the region non-rectangular, the special direct methods may 
not apply. Moreover, the use of general direct methods such as factoring^ into a Cholesky decomposition, 

A = LL T , 

often impose unbearable storage requirements because many zero entries are filled in during the 
decomposition. 

Iterative methods, such as SOR and Gauss-Seidel, solve a transformation of the problem. Instead of solv- 
ing (1) directly, they "split" A as A — M-N, and solve the equivalent problem through an iteration of the 
form Mx k+l = Nx* + b\ i.e., find a fixed point of the equation 

Mx = Nx + b. (2) 

The efficiency of the solution method depends in part on the appropriateness of the splitting and in part on 
acceleration of the iteration toward the fixed point 

The conjugate gradient algorithm can be motivated as the solution of another transformation of problem 
(1). Consider the inner product <jc,y> = x T Ay. This induces a norm ||x||. 4 = \J x T Ax on R k when A is positive 
definite. 
Problem (1) is equivalent to: find* to minimize 

E(x) = (x-x*) T A(x-x*) = \\x-x*\\ \ (3) 

where x* is the solution of (1). The problems are equivalent because E(x) ^ for all x, and E(x) = <=> x 
= x*. The immediate usefulness of the transformation is not obvious, especially since computing E(x) 
appears to require the solution, x*, of the problem we wish to solve. Note however, that we can evaluate 
whether a given x is or is not a solution by computing the residual r — b — Ax. Further, while we cannot 
evaluate E(x) directly, we can evaluate its gradient vector: 

VE(x) = 2(Ax-Ax*) = 2(Ax - b) = -2r. 

Hence, r gives us the direction in which E decreases most rapidly (unless r is identically zero, which charac- 
terizes the solution). 

These two characteristics of problem (3) lead directly to a simple algorithm for solving (3): 

STEEPEST DESCENT: Given a guess x h not a solution, 

set r, = b — Axi 
setx* J+1 = Xi + aji 

where a, is a scalar chosen to make x i+l minimize E (x i+i ) for all vectors of the form x { + pr c . 

We can easily compute the gradient r,. It is also easy to compute a, so that E (x J+1 ) = E (x ( + aj) < E (x t + 

/3r t ) for all/3. The mimimum of E{x { +fir) is given by the solution of 
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J-2?(*,+/J,) = 



But 

J -£(*, + /}r,) = 2(/?r,Mr, - r/'V,), 



c# 



so 



r,M r, 



(4) 



is the optimal choice for a,. Hence, we can carry out the iteration to minimize E (x) without computing E. 
Note also that^ enters the iteration only in forming the products^*, and^r,. 

Unfortunately steepest descent is not a practical algorithm because the convergence can be very slow. We 
can obtain more rapid convergence by using not the steepest descent directions {r,}, but instead a sequence 
of downhill or descent directions {/?,} which satisfy additional properties. We characterize the term "de- 
scent direction" and simultaneously find the optimal distance to move in that direction, through the follow- 
ing simple result 



LEMMA: //p r r^0, then 



p'r 
P = — 7-r — minimizes E (x 4- ftp) for all ft. (5) 



PROOF: Simply differentiated^ +fip) with respect to/?: 

-^E(x+Pp) = 2(p T A (x-x*)+pp T Ap) 

= 2(-p T r + f}p T Ap). 

It is a simple computation to show that E (x + pp) < E (x). Note that ft is positive \ip T r > 0, which charac- 
terizes/? as a descent, not an ascent, direction. 

The first observation which leads to improved convergence is that the choice in (5) for /J, not only solves 
the one-dimensional minimization, it gives us the p — component of x* exactly. To see this, note that A posi- 
tive definite implies that we can uniquely write 

x + ftp = ap + 6w 

(6) 

x*= a*p+6*z 

where w T Ap = = z T Ap. This last condition is read: w, z are ^-orthogonal or A-conjugate to p. Now substi- 
tute the decompositions (6) into E: 

E(x + ftp) = (a - a*) 2 p T Ap + terms not involving a, a* or/? (7) 

The choice (5) for/? minimizes the left side of (7), and clearly a — a* minimizes the right hand side of (7). 
Hence choosing/3 = P r implies that 
p T Ap 

(x+pp)-x* = 6w-6*z. (8) 
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The error vector is A-conjugate to/?, i.e., lies in the k — 1 dimensional subspace of vectors t such that t T Ap — 
0. If all succeeding descent directions are chosen from this subspace, we can preserve the exact solution of 
the problem in the direction/?. 

The second observation is that we can easily choose successive directions/?!, p^ /? 3 , . . . such that the {/?,} 
are all pairwise A-conjugate descent directions (hence are "A-conjugate gradients") and hence, such that 
the successive errors x x — #*, x 2 — x*, x 3 — x* are constrained to successively smaller dimensional sub- 
spaces. The set {/?,}, i — 1, . . ,k gives a basis for R k and x k — x* must be ^-conjugate to all of the {/?,}, hence 
must be zero. Thus, the process must give the exact answer after at most k iterations. We now write out the 
conjugate gradient iteration directly: 

CONJUGATE GRADIENTS: Given x u compute r x — b — Ax x . Set the initial descent direction/?! to be r x . 
For i = 2, 3, 4, . . . ,do. 

Move to the minimum in direction/?, and evaluate the new residual 

Pi T Ap t 

X i + i — Xi + OtiPi * ' 

r i+x = b -Ax i+l 
— r, — (XiApi 

Compute a new descent direction A-orthogonal to all of its predecessors 

p. = - r ^Ei 

F P (10) 

p«+i = r i+1 +piPt 

The initial step (9) in the iteration is the step in the steepest descent direction/?,, as discussed above. Step 
(10), the choice of a new descent direction, is simply a single step of Gram-Schmidt orthogonalization, to 
remove from the steepest descent direction its component in the direction^/?,. 



Hence 



Futher the choice of a, implies directly that 



p i+1 T A Pi = (11) 



r i+l T Pi = (12) 

The following identities are derived easily from (9) and (10): 

r l+l T r t = 0; 

r? Pi = ri T rr, (13) 

n T Api - Pi T Api. 

The most important algebraic consequences of (9) and (10) yield only to a lengthy inductive argument (not 
given here — see Reid [12] or Hestenes and Stiefel [13] for example) 

r i+x T r,- = 

} (or allj<i (14) 

p i+ i T Apj = 
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The identities in (14) imply that all of the{p,} are ^-conjugate, even though we perform an explicit orthogon- 
alization only to one descent direction in (10). 

In the conjugate gradient iteration observe that A enters only in forming a matrix-vector product, Ap { \ 
there are no transformations done which could destroy the sparseness of A. The dominant cost per step is 
usually that of forming the product Ap { \ in this case the conjugate gradient iteration is very little more 
expensive per iteration than the steepest descent algorithm and it offers the guarantee of convergence 
within k steps. 

The practical reason for using the conjugate gradient algorithm is that we often obtain satisfactory accu- 
racy after only a very few iterations. Certain theoretical bounds for the convergence of the algorithm depend 
on the condition number, x, of A, defined by: 



k max {A) 

K = X~(^T (15) 



where \ max (A) is the largest eigenvalue of A and X min (A) is the smallest eigenvalue of A (Note: A is positive 
definite, so k is positive, and at least as great as one). The following bounds can be found in the literature 
(see Daniel [14], for example): 

lk-*l a <4 




£(*,)< 4 



Clearly, convergence is rapid when x is close to one (and takes place in one step if k equals one). For well- 
conditioned problems very few iterations will suffice to obtain highly accurate solutions. 

2. Preconditioned Conjugate Gradients and Matrix-Splittings. 

Our basic problem is to solve 

Ax = b, (16) 

which is the large, sparse linear system which arises from the discretization of an elliptic partial differential 
equation. We can assume that the matrix^ is symmetric and positive definite, so the method of conjugate 
gradients certainly can be used. However, the condition number, x (A), is usually very large for these prob- 
lems; the conjugate gradient algorithm converges very slowly when applied directly to A and is not competi- 
tive with other methods. 

The preconditioned conjugate gradients method arises, like SOR, from matrix splittings. Consider 

A = M-N 

where M is another positive-definite, symmetric matrix, but one for whjch we can easily solve linear systems. 
There exists a symmetric positive-definite matrix M~ ul such that 

M l = (M- l/2 )(M~ l/2 ) 



[(M- l/2 )(M' l/2 )Y l = M 
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(This is the symmetric matrix whose eigenvectors are the same as ATs and whose corresponding eigenvalues 
are the reciprocals of the square roots of the eigenvalues of M. We shall not use this form, however; we need 
only the formal existence of M~ in ) We can solve the equations Ax = b by computing 

d = M~ 1/2 b; (17) 

and solving 

Cz = d (18) 

where C — M~ l/2 AM~ U2 , by the conjugate gradient algorithm (Note that C is a positive-definite, symmetric 
matrix), finally computing 

x = M~ l/2 z (19) 

This transformation horn Ax — b to Cy — d will be an effective method if: 

a) the condition number, x (Q, is close to 1 so that the conjugate gradient algorithm converges rapidly, 
and 

b) the multiplication Cy for any vector y can be computed efficiently and sparsely. 
We shall examine condition (a) first. Note that 

A = M-N 

implies 

C= M- l/2 AM~ l/2 = I-(M- 1/2 NM~ l/2 ) = I-R 

Hence 

W/-/J) 

This condition number, x (Q, will be close to 1 if X max (R) is small, which is true if all of the elements of R are 
small. (The condition number is 1 exactly when R is 0, or when M = A). We want, then, to choose M so that 
M represents A as well as possible (makes R as small as possible), and such that the choice for M allows for 
condition (b). We discuss a specific choice for M in section 3, the discussion of the model problem. Other 
examples are given in Concus, Golub, and O'Leary [3], O'Leary [4], and Meijerink and van der Vorst[9]. 

We now turn to condition (b); with a formal derivation of the actual algorithm used as "Preconditioned 
conjugate gradients," we show that condition (b) is met whenever M is such that the linear equation 

Mz = w 

can be solved efficiently and sparsely. Reconsider the conjugate gradient iteration to solve Cz = d, given an 
initial guess z t : 

(CG C ): set?! — d— Cz x \ 

seipi = r t ; 

for i = 2, 3, 4 . . . 

a t = - — ! 



prep.. 
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Z i + X — Z t + OtiPi 

fV+i = 7, — CdCpi 

r i+ i T r i+l 



ft 



Ti Ti 



p l + l = r i+l +pip, 

In the above, the residual vectors {?<} and descent directions {/>,-} have bars on them to denote that they 
pertain to the scaled problem Cz = d. Now view (CG C ) from the original space in which we solve Ax = b. 
Define 



r t = b — Ax; 
Then 

r t = d-Cz t = M~ l/2 r, 
Similarily, we can define formally 

p, =M~ U2 p, 
Then 

7/r7 _ ^(M^ytM^V, _ r,' (M" 1 r,) 

Updating z, corresponds to taking 

Xi+i = x, + a, M" 1/2 jD, = x, + a ,-/?,- 
The corrected residual is 

r, +1 = M 1/2 r 1+1 

= HP^ift -a,Cp t ) 
= M ul r i -a l AM- x,2 p l 
— r L — a, ^ p t . 
The Gram-Schmidt coefficient /}, can be computed as: 

The new direction becomes: 
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= M- l/2 (M- l/2 r i+l )+p lPi 

= M- l r i+1 +(] iPi . 

The result of these manipulations is that we can now write a conjugate-gradient like iteration to solve the 
equations Ax = b, but whose convergence rate depends on 

C = M- l/2 AM~ l/ \ not A. 

PRECONDITIONED CONJUGATE- GRADIENTS (PCG): Given initial guess Xl : compute 

v, = Ax x 

r x — b — Ax x — b — v x 

solve Mu x — r x 

and set p x — u { 

For i = 2, 3, 4, . . . 

compute Vi — Ap { \ 

r?Ui r t T M~ x n 



set a, = 



Pi T Vi Pi T Api 



update x i+l = x< + a t pi 

r (+ i = r, - affVf ( = r < — ««■ ^P«) 

solve Mu 1+ i = r J+1 

r i+l T u i+l r i+l T M~ l r i+l 

set /J,. = — ~ ( = 



finally, p i+l = u i+l + /?,/?, 

We introduced new vectors {ii;} and {i;,} in the algorithm above to emphasize the fact that the matrices A 
and M appear only one time during each iteration, and in very specific ways. The matrix^ appears only in 
the formation of the product Ap { (or^x,), an operation which can be done very efficiently for most represen- 
tations of sparse matrices. The matrix M appears only implicitly; we must be able to solve the linear systems 
Mu = r efficiently and in little storage, and this is the primary restriction on M. Note that the matrix 
square-roots, M 1/2 , do not appear at all. The purpose of the matrix M is to scale or pre-condition the problem 
so that convergence takes place very quickly. The cost, of course, is that each iteration now requires the solu- 
tion of a linear equation as well as the formation of a matrix-vector product. 

3. A model problem — the pressure equation. 

The problem to be discussed here arises in a model of buoyant convection (Rehm and Baum [6]). Specifi- 
cally, at each time step in the solution of a mixed hyperbolic-elliptic system, we are required to solve: 



v (^fe) vp M =/( ^ 



(20) 



on a rectangle R subject to the following condition on the exterior normal derivative 

dP , s 
—= — (x,y) = Q(x,y) - g(x,y) on the boundary dR. 
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Here q denotes gas density and P the unknown pressure. We assume that q depends on both spatial vari- 
ables. Equation (1) then describes a nonseparable elliptic equation. Since we must solve this equation 
repeatedly, speed is paramount. Were this a separable equation, the discrete linear system discussed below 
could be solved directly by the cyclic reduction routines of Sweet and Schwarztrauber [10]. Were the density 
and its first and second derivatives known analytically, the problem could be converted to an ordinary 
Poisson equation by the techniques of Concus and Golub [2]. Neither of these conditions can be met, which 
forces us to consider the discrete linear system in more detail. 

In the context of the buoyant convection problem, we are given the density q, and g and/ only on a dis- 
crete set of points; the solution P will be produced on the same grid of discrete points. A grid suitable for the 
hydrodynamic calculations is: 



— • 

(l,n) 


• 


• 


• • • 

• 

• ••• • 


1 • 1 

(m,n) 


• 
• 
• 


• 
• 
• 


(1,3) 


n • 

(12) 


• 
(2,2) 


o • 

(U) 

1 • 1 


• 

(2,1) 

• 1 


• 

(3,1) 

• 1 


• o 
(m,l) 

• 1 



Ax/2 



Ax 



The functions q and/are given on the interior points; g is given at the circles on the boundary. Note that the 
interior grid is offset by one half grid spacing from the boundary. 

The derivatives in (20) are replaced by second-order accurate centered finite differences. For example, 



dx 



1 



Q(x,y) d x 



d * 

P(x,y) 



becomes 



Ax 



1 



Ax x 
q(x + — 2~, y) 



[~h^ x 



+ Ax,y)-P(x,y)) 



•]- 



1 



q(x ~ -j—! r) 



[■=- (?< *< 



y) - P (x - Ax, y) 



,)]) 



Note that we require the reciprocal of the density at an intermediate point an O(Ax) approximation will suf- 
fice to give second order accuracy for the operator. For consistency with the hydrodynamic equations in the 
buoyant convection model, we use the reciprocal of the average density: 
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1 2 

q{x + _^_, r ) " W* + A *> r) + ofay)] 

Define q^ — q ((t — 1/2) As, (/— 1/2) Ay), and similarly for P^fij, and g l7 . Then the general equation for an 
interior grid point i, j not adjacent to the boundary is 



1 \2 




> 



(21, 7 ) 



= fu 



The adjustment for points adjacent to the boundary uses the second order centered approximation to the 
boundary derivative: on boundary k (left = 1, right = 2, lower = 3, upper = 4), the terms d k are evaluated 
using the first interior mesh point and an image point (one-half grid spacing outside the region). Formally 
the terms d x are evaluated at i — 1/2, j at the left boundary for example. The discretized form of the 
Neuman boundary conditions become 



VAST/ 2 \Y dlPm ' J ^ Y^^) = gm,U2,j/^X = g 2 (J) 

("Ay") 2 v T*^'' 1 + ~j d3Pl '°) = &,v2/Ay = gJLi) 
VAy/ 2 \~Y d4Pi ' n + y^> +i ) = ^> + i/2/Aj = ^ 4 (0 



(220 
(22 2 ) 
(22 3 ) 
(22 4 ) 
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For a point adjacent to boundary k, subtract equation (22*) from equation (2l it j) which eliminates the d k 
terms, and with them, all references to points i, j outside the grid. The right hand side is replaced by 

fu ~ gk 

Corner points are treated by subtracting (22*) and (22,) from (21,,), where the corner is adjacent to both its 
A>th and i-th boundary. 

We arrange the grid points in the order 
{(1,1), (2,1), . . . ,(ra,l), (1,2), (2,2), . . . , (ra,2), . . . , (l,ra), . . . , (m,n)} and write the equation for each point in 
this order. The result is the matrix equation 



i — j — i — i 

0, T, I 2 I 

\--\-v- J x 



\t , ! o 



n-l , 

I °r,l ! 'n J 
I J I 



where 7] is an m x m symmetric tridiagonal matrix, and 0, is an m x m diagonal matrix. We shall denote this 
matrix equation by 



Ax= b 



(23) 



where b represents the right hand side,/, of the pressure equation, adjusted by the boundary data, g. 
This is the linear algebraic system we need to solve. We make several observations: 

1. A is symmetric 

2. A is sparse. Only five diagonals contain non-zero elements. 

3. The eigenvalues of A are all real and non-positive; A is a negative semi-definite matrix, generally of 
rank mn — 1. Hence —A is a positive semi-definite matrix. 

Although we could apply the method of conjugate gradients directly to — A, convergence would be very 
slow. O'Leary [4] and Meijerink and Van der Vorst [9] suggest several different approaches for creating scal- 
ing matrices M to use a preconditioned conjugate gradient scheme. One general approach is to consider 
that A originated from a differential equation and let M be the discrete operator from a separable differen- 
tial equation which approximates the pressure equation. 

A specific choice is suggested by Concus and Golub [2]: Let L be the discretization of the Poisson equation 

V 2 P = / 
with Neumann boundary conditions (on the same staggered grid). Then 



377 



L = 



c,/ , B 2 
| X — 



4 

1 



where 



v — -i — -i 
! % 1 c i l 
!_ — ; — 4 



1 c.I ! B 



L I 4 



B,= 



a, c 2 

c 2 a 2 c 2 



c 2 a 2 c 2 



o 







— 




O 




c 2 






c 2 


a 2 


c 2 




c 2 


a x 



B 2 = 



a 3 c 2 

c 2 a 4 c 2 



O 



a 4 c 2 



O 



a 4 



c 2 a 3 



(24) 



with ax = — 



a 2 



a 3 



a 4 



(A*) 2 


(A r ) 2 


2 


l 


(Ax) 2 


(Ay) 2 


1 


2 


(A*) 2 


(Ay) 2 


2 


2 


(A*) 2 


(Ay) 2 


1 


1 



(Ay) 2 



(Ax) 2 
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We 


assume that 


we have available a good subroutine for solving the equation 








t 










Lu = v, 












e.g., 

[10]. 

L 


subroutine 
etD 1/2 beth 


BLKTRI from the NCAR packaj 
e diagonal matrix with entries 

d kh 


ye of subroutines for 


elliptic 


partia 


1 differential 


eqi 


lations 


= V a kk /\ kk , 




the 


square root c 


)f the ratio of correspondir 


ig main 
M- 


diagonal entries of A 
= D x,2 LD xn 


and L. 


Then 









is a symmetric negative semi-definite sparse matrix whose main diagonal is identical to that of A. Hence, in 
the splitting 

A = M - N, 

the matrix /V has zero main diagonal and non-zero entries on at most four off-diagonals. 
In the preconditioned conjugate gradient iteration we must be able to solve 

Mz= r 

This is done without excessive storage demands by: 

a) compute w = D~ xn r (D is a diagonal matrix) 

b) solve Lu — why cyclic reduction (BLKTRI) 

c) compute z — D~ l/2 u. 

For computational convenience, to avoid the repeated multiplications by the diagonal matrix D xn or its 
inverse, we replace the original problem 

Ax = b 

by Ax = b 

where A = D l/2 AD' l/2 

b = D U2 b 
solve Ax = b by preconditioned conjugate gradients withL as the scaling matrix, and finally compute 

x= D- U2 x 
This is the approach actually used in the computations reported in section 5. 

4. The Preconditioned Conjugate Gradient Algorithm for a 
Semi-definite Matrix. 

In our discussion of the discrete operators A and L in the previous section we have ignored one character- 
istic of these operators which renders them unsuitable for direct use in a conjugate gradient or precondi- 
tioned conjugate gradient iteration. The conjugate gradient algorithm requires that A be a positive-definite 
matrix; otherwise the function E may have zeroes other than at the solution of the linear equation. The 
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algorithm becomes transparently a maximization algorithm for negative-definite matrices. (There is no need 
to change signs implicitly or explicitly to solve linear systems involving negative definite matrices, e.g., the 
discrete Laplacian with Dirichlet boundary conditions). However, our operators A and L are both negative 
semidefinite; they have negative eigenvalues, and also, one zero eigenvalue each. 

In this section we will extend the theory of the conjugate gradient algorithm to allow for the special case 
of a semi-definite matrix with known nullspace. We shall then extend the preconditioned conjugate gradient 
algorithm to allow both A and M to be of this special type. The first part of the section will be elementary for 
readers well-versed in the conjugate gradient theory; the second part contains some new and unexpected 
results. 

The problem is simple: A is a singular matrix. This is shown easily by noting that whenever a term d k 
appears on the diagonal of A, the opposite term — d k appears on one of the off-diagonals. Hence, the sum of 
the coefficients in any row of A is exactly zero. But this is the same as saying 

Ae = 

where e is the vector(l,l,l, . . .l) r . In general, Ax = <=>x = ae, where a is a real scalar. 

The singularity of A is related to the Neumann boundary conditions for the pressure equation. If P is a 
solution to the differential equation so is P + c, where c is any constant. Similarly, if x is any solution to (3), 
so is x + o/e, where a is any scalar. But this is equivalent to adding the constant a to each point x u of the 
solution. The singularity of the operators also implies that not every system of equations has a solution. A 
system of equations is consistent if and only if it has a solution. The characterization of consistency for these 
operators is simple to express: The pressure equation is consistent < = > 

//»/=/.«? (25) 

The linear equations^* = b are consistent <=> 

b T e = 0, that is, when b in Range (A). But 
b T e = <=> I fey = 

<=> If, = X( gl (j) + g 2 (j)) + lfc(0 + gli)) 

the discrete analog of the integral equality (25) 

Further, even when the equation is consistent, the solution is not unique. There is a unique solution of 
shortest length in the usual L 2 norm. For the continuous case it is the solution with mean pressure zero, i.e., 

IS.P = o. 

In the discrete case the analog is 

x T e = 0; 
again, the mean (discrete) pressure is zero. 

We shall now discuss the modifications to the conjugate gradient algorithm which would be necessary to 
obtain this unique solution to a consistent system of equations. For generality, assume that we want to solve 

Ax = b (26) 

where A is symmetric positive semi-definite of dimension k with known nullspace the span of {n}. Hence, the 

rank otA is one less than its dimension. Further, assume b T n = o, so that (26) is a consistent system. 

b T n 
(note: if (26) is not consistent, b = b — — = — satisfies the consistency condition, and any solution of Ax — b 

n n 

is a least squares solution of (26)). 

Let the vectors {n,q^q 3 ,q Ai . . . ,q k } form an orthonormal basis for R k . Then Range (A) = span 

{?*?* , • • ,?*} 
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Let Q be the orthogonal matrix ( Q T = Q *) 

Q = [n ! q 2 i q 3 '. ! gj- 

Then 

J % = ^ <=> (£M (<? r *) = (? r 6 

But, by the consistency condition b T n — 



Q T b 




d 2 



'dk-i 



Similarily 



Q*A Q = 



B 



k-\ 



k-\ 



Hence, (27) really takes the form 











w 

Z\ 
Zi 


_ 



d t 
d 2 









B 
















Z k -i 




'd k ., 



(27) 



(28) 



Clearly w is arbitrary in (28). The equation holds whenever 

Bz = d (29) 

a (k—l) — dimensional problem. All solutions of (26) are given by 

where z solves (29). The unique minimal length solution is given by 

x* = q[°~] 

By the assumptions on A, the matrix B is positive-definite and the conjugate gradient iteration can be 
used to solve (29). The condition number 

X k .,(B) h(A) 



*(B) = 



A,(B) X 2 (A) 
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(where we assume the eigenvalues are ordered algebraically) governs the convergence of the algorithm. 

We now show that we can solve the consistent linear system (26) directly by the conjugate gradient algo- 
rithm, with convergence rate determined by x (B). The proof is elementary: we show that the algorithm, 
when started with an initial vector x x and a right hand side b which lie in the same invariant subspace of A, 
solves the problem while remaining entirely in that subspace. 

Suppose we have the equation 

Ax= b 

with b T n = andx^n — 0. Then the conjugate gradient algorithm, when applied to^, produces vectors x h 
r, and pi exactly equal to 

x t = Q 



n = Q 



and 



ta 



where z h r, and/?, are the vectors produced by the conjugate gradient algorithm for 

Bz= d 
with initial vector z, = the last k— 1 entries of Q T x x . We note that 
Q T r t = Q T (b -Ax,) = Q T b - Q T A(QQ T )x 



BJ-f 




Bz] 







where it is essential that b T n = 0. It follows that 

Q 



-S 



So assume the required properties hold for*,-, r, and/?,. We shall show the iteration preserves these prop- 
erties ioTXt+u r J+1 , and /?,-+!. The first step in the iteration is to compute 



which by induction is 



r, r, 
pfApi ' 



" " 


T 


0" 


r, 


Q T Q 


ft 


"0" 


T 


~0~ 


h 


Q-AQ 

A T A 

TiTi 


- ? 



pfBp, 
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since Q T Q = /and 



Q*AQ= \0 



B 



Hence, a, = a,, and it follows that 



x«+i 



.♦■*- °\3+-«\8-°yd-<\£l 



Then 



r i+l — r, — (XiApi 

That/?, = p, follows from exactly the same reasoning that showed equality for the numerators of a, and a,. It 
is equally easy to show that 

/>,+, = r, +1 + fi iPi <=> Q 



T p~ = [f;;rhfc\ 







Thus the conjugate gradient iteration will succeed in solving a consistent system. 

We should note that the condition that x[n — is required only to assure that the solution also satisfies 
this property. If the minimal length solution is not desired, this condition can be ignored. The condition that 
b T n — 0, that^x = b be consistent, is essential. If the system is not consistent, the initial residual r„ and all 
successive residuals, will have the same component in the direction of the nullspace. Suppose that b (and r, 
and/?,) has a component y -n of the nullspace. The recursion for {r,j 

r, + , = r, — a t Api 

shows that r, +1 has a component y • n of the nullspace for all i. This, of course, also follows from the property 
that r 1+1 is a residual vector for^;e, and b. The directions {/?,}, and the approximate solutions {*,}, have com- 
ponents of n which increase with i(in fact, rapidly). 
It suffices to examine the recursion for/?, 

p l+l - r I+1 + piPi 

where /3, >0. If d, is the component of n in the vector/?,, 

($.-♦. - (y + /U) = (y + p, (y + /?,-, •(*,-.)) = ... 

Thus the direction's component of the nullspace increases. At the same time, /?, +1 is shorter than r, + , in the^ 
norm, so the relative component in the direction of the nullspace grows. In the limit, the directions may 
become nearly parallel (they are still A-orthogonal, but^ does not induce a metric). We can see this also by 
looking at the limiting case when a iteration is started with an actual solution vector. The residual vector 
(which is not a descent direction) and the initial direction, are in the nullspace — the initial step ct x is infinite. 
The last example given above implies that the convergence rate for obtaining the least squares solution to 
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an inconsistent system no longer depends on x (B). The convergence obtained in practice may be much 
slower than the bound obtainable for a consistent system. In practice, we are interested only in minimizing 
the residual, which we can ensure by working with a consistent system. In actual finite precision computing, 
the residuals and directions may wander slightly into the nullspace, in which case we are in the same situa- 
tion as if we started with a slightly inconsistent system. The effects will be negligible unless the rate of con- 
vergence is very slow. In any case, the effects may be suppressed by explictly reorthogonalizing the direc- 
tions/?, to the nullspace. 

In the preconditioned conjugate gradient algorithm we replace the system 

Ax = b 

by 

Cz — d 

where 

C = VAV, 

V some positive definite symmetric matrix. In the special case of a semi-definite A, a natural choice for a 
scaling matrix may also be semi-definite. This is the case for the two operators discussed in section 3. We 
have two possible cases to consider: V positive definite and V semi-definite. 

The case of a positive definite V is little changed from the ordinary semi-definite conjugate gradient 
algorithm. We take M = (F 2 )" 1 , or more directly, V = M~ in . The rank of the matrix C is k— 1 and its 
nullspace is 

span { AT /2 n A } 

where n A denotes a non-zero vector in the nullspace of A. The convergence condition for C required that rj 

and/?, be orthogonal to M U2 n A . 

The former condition will hold if Ax — b is consistent since then 



and 



r, = M~ U2 (b-Ax) 



{b-Ax) T n A = 



The latter condition is equivalent to 

p { 1 Mn A 
since 

Pi = M~ U2 p i 
But clearly/?, = M' 1 r x is orthogonal XoM n A , since r*n A — 0. By induction, iip?Mn A — 0, 
p i+l T Mn A = (r i+l T M~ l )Mn A 4- ft (p ( T M n A ) = r i+l T n A = 0. 

A (C) 
Thus, convergence is governed by the pseudo-condition number , /r . , the residual vectors r, remain those 

of a consistent system, and the coefficients a, and/?, are determined by an iteration remaining in a subspace 
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of the range of C. However, the scaled directions p ( are allowed to venture into the nullspace of A, and 
hence, the solution vector x is no longer necessarily the minimal length solution. This may be repaired easily 
at the end of the iteration by computing 



x T n A 

x = x — 



n A n A 



The other natural case is one in which the scaling matrix M is also positive semi-definite, with the null- 
space generated by a known vector n M . The linear operator corresponding to M" 1 in the definite case will be 
M\ the pseudo-inverse of M. (For our purposes, we will need only a method for solving consistent systems 
Mz = w. Our knowledge of the nullspace of M then enables us to compute the minimal length least squares 
solution, M + u, for any right hand side u). Formally, the matrix V appearing in the definition of C is (M + ) 1/2 , 
so 

C = {M + ) l/2 A(M + ) l/2 • (6) 

Several conditions which must be met by M are immediate. Since nullspace (Q 3 nullspace (A/), the rank of 
M must be at least the rank of A. For our specific problem rank (M) must be at least A;— 1. Otherwise C has 
rank at most k—2 and cannot possibly span the entire k—\ dimensional space of possible solutions to Ax = 
b. To guarantee that C has rank A;— 1, we must have rank (M) ^ k — 1 and also that the nullspace of M is not 
orthogonal to the nullspace of A. If the latter condition fails, we know that n A lies in the range of M (since it 
is orthogonal to the orthogonal complement of the range of M). Hence, by symmetry of M, n A lies in the 
range of (M + ) 1/2 . There exists a vector z such that(M + ) 1/2 z — n A and z 1 n M . We would have: 

Cn M = (M + ) 1/2 A ((M + ) U2 n M ) = (M + ) 1/2 ,40 = 
Cz = (M + Y /2 A((M + Y /2 z)= (M + ) l/2 An A = (M + ) 1/2 = 

Thus, the dimension of the nullspace of C would be at least two. 

Recall now the motivation for the preconditioned algorithm. The rate of convergence of the conjugate 
gradient algorithm is determined by the condition number, x(A), which is large for most discrete elliptic 
operators. To accelerate the coverage of the conjugate gradient algorithm, we replaced A by the precondi- 
tioned operator C = M~ l/2 AM~ l/2 , where we presume that x{C) ~ 1. Our presumption can only be true if 
k(A) ~ x(M), i.e., the preconditioning matrix must have roughly the same poor conditioning as has^. This 
requirement follows from the inequality 

where we note that x(M~ l ) = x(M). A brief proof of this inequality is given below: 

)c(Q = A WflV (C)/A^(C). 

However, the eigenvalues of C are the same as the eigenvalues of M~ l A. We can bound the eigenvalues of 
this latter product as: 

X max (M-'A)>X ma AM-')-X m ,„(A), 

k ml AM- 1 A)<k mln (M- l )k m «.(A). 
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It follows that 

AL) ' A_(A) *A mi ,(M) 

= x(M)/ 4A). 

The other inequality above follows by a similar argument. 

We can obtain suitable preconditioning matrices for most positive definite discrete elliptic operators only 
by using poorly conditioned matrices. The required poor conditioning of M may be even worse in the case 
where both A and M are semi-definite. The preconditioning breaks down if 

n A T n M = 0. 

We shall now show that the preconditioning nearly breaks down, in the sense that M must be very ill-condi- 
tioned, if the nullspaces are almost orthogonal. Assume that n A and n M have length one, and that 

n A T n M — d, 

which is much less than one. We can write A and M + in their respective eigendecompositions 

A = UD A U\ 

M + = VD M+ V\ 

where the zero eigenvalue of^4 and of M + is given first. Then 

C= M+ l/2 AM +l/2 = SS\ 

where S is defined by 

5 = VD M+ U1 V T UD A l/2 . 

For each of the matrices S, M + , A, and X, define the condition number of the matrix as the condition 
number of the matrix restricted to the orthogonal complement of its nullspace. Then 

x(S) = x[(ZW /2 X*)(A<" 2 )]. 

In the above equation, the matrix X is the (A:— 1) by (k—\) matrix obtained by removing the first row and col- 
umn from V T U, that is, 



V T U = 



We now note that the inequality 



x(BC) > max 



k(B) x(C) 



AC) ' x(B) 
holds for unsymmetric matrices B and C. (The proof follows from the triangle inequality for matrix norms, 

PIKIIflqi ||c-||.) 
386 



When we extend this inequality to products of three matrices we obtain the relatively weak result that 

mrm> 1 * g > *(Q *(D) 

WLU)*max j x(C)x(i)) . ^ )x(D) ■ x(5)x(C) 

In the context of our preconditioned operator, only x(X) is unknown. However, Alan Cline [11] has shown 
that when<5«l, 

k(X) = 1 / d. 

Thus, the condition number of the preconditioned operator C is bounded below by: 

k(Q>(\/6) 2 /(x(A)k(M)). 
Since x{A) is fixed, this implies that k(M) must be large whenever 6 is small. 

V. Numerical Results 

A code, written in FORTRAN, was developed to implement the preconditioned conjugate gradients 
scheme discussed in previous sections. Care was taken in the preparation of the code to make it portable and 
to introduce many comments for clarity. The code has been run successfully under a variety of conditions 
and has been compared with analytical results to determine its accuracy. In this section a description is 
given of some of the computations used to determine the performance of this code. 

The model problem, the pressure equation, was discussed in section III. Special cases of this general non- 
separable elliptic equation were used to test the code for accuracy. All production runs of this code were per- 
formed when the code was imbedded in a larger linear or nonlinear fluid dynamics computation; timing 
studies were performed in such an environment. 

Within the code, two tests are used to terminate the conjugate gradient iteration; these depend upon two 
specified parameters, the maximum number of iterations (less than or equal to 50) and a maximum relative 
residual, £. (The relative residual is defined as the norm of the residual divided by the norm of the right 
hand side in the scaled problem discussed in sections 2 and 4.) In all successful computations to date, the 
iteration is terminated after a relatively small number of iterations by the relative residual norm satisfying 
the criterion that it be below £. 

To test the code under the simplest conditions, a Poisson equation on the unit square was discretized and 
solved with homogeneous Neumann conditions applied at the boundary. In continuous form this problem is 

d 2 p d 2 p y, v _ / . ,*\ 
-q^t + -0-j- = / (x,y) = cos (xtt x) cos {in y) 

on < x < 1 and < y < 1 . The boundary conditions are 

- = atx = and re = 1, 
Ox 

dp _ 



- at y = andy = 1. 
dy 



The solution to this problem is 



P (*>7) = ~ H35 + ( XTT ) 2 cos ( X7T *) cos (in y) 

When this problem is discretized to second order accuracy on the "staggered grid" (a grid displaced one 
half incremental unit 6x in the ^-direction and one half incremental unit dy in the y-direction) as discussed 
in section III, it can be written 
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— (p.+i* - 2p tJ + p t . XyJ ) + -r-r (p« v+ i ~ 2/>y + Av-i) = /y 



Where 

/ v = cos [™ (j - 1/2)] cos [^ (j - 1/2)] 

for 

1 < i < m and 1 < / < n. 

Here 6a: is the mesh spacing in the x-direction, 6x = 1/ra (m is the number of mesh cells in the x-direction) 
and 6y is the mesh spacing in the y-direction, 6y = \/n (n is the number of cells in the y-direction). The 
boundary conditions are 

Poj =Pu andj5 w+lv , =p mJ for l</<n 

Pi,o = Pi,i and/5 I>+1 = />,> for Ki<m 

The solution to this linear algebraic system is 

(2m sin — Y + (2n sin — ] 

V 2m) I In) 



Pij = /tt r> cos [~ (*"**)] cos [-^ (/-I/2)] 



forO<t<mH-l,0<y<ra+l 

The solution to this simple discretized Poisson equation was computed using the code, and the solution 
compared with the analytical solution given above. 

A small test problem, m = n — 7, was used to determine the accuracy obtainable when e = 10" 6 Com- 
parison with the exact solution demonstrated that the components of the solution vector obtained from the 
computation agreed to at least six significant figures with the exact solution. Generally the agreement was 
much better, being seven or eight significant figures for most values of i and/ (Note that the UNIVAC 1 108, 
on which all computations were run, carries about eight significant figures.) 

As a larger test problem, the equations with m — n — 31 were solved withe = 10" 6 For this computation 
agreement was obtained to a few parts in the sixth significant figure. 

A second test problem, a discretized approximation to a separable elliptic equation, was also solved ana- 
lytically and using the code. Comparison of these results indicated that the accuracy was similar to that 
obtained in the first test problem. 

The code was then imbedded in a linear finite difference computation obtained from a second-order dis- 
cretization of a set of equations arising in fluid dynamics. These equations describe two-dimensional inter- 
nal gravity waves in a stratified ambient fluid within a rectangular enclosure. The interest in this problem is 
discussed, the continuous and discrete equations are presented and exact analytical solution to the con- 
tinuous and discrete problem are given by Baum and Rehm [8]. Additional linear computations using this 
code on a somewhat more general fluid-flow problem are described in Rehm and Baum [7]. In this latter 
paper the more general linear finite difference equations are given, and the computational procedure for 
solving them is presented. The manner in which the preconditioned conjugate gradients code is used in solv- 
ing the elliptic pressure equation is discussed. In all of these linear computations, the equation for the 
pressure is separable; it is only in the general, nonlinear computations that the equation for the pressure is 
nonseparable. 

For the linear computations reported in these papers, the number of iterations required to obtain a rela- 
tive residual error less than £ = 10" 5 or 10~ 6 generally varied between 2 and 4. A large number of such com- 
putations have been run. 

A representative one is a calculation for which m — 15, n = 16 and e = 10" 6 ; in this computation the 
pressure-solver was called 200 times. Two iterations to convergence were taken ten of the first eleven times it 
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was called, each call using about 0.5 s of CPU time on the NBS Univac 1108. Thereafter, each subsequent 
call required only one iteration to convergence taking about 0.35 s of CPU. An indication of the rate of 
covergence of the algorithm is obtained by taking 

_ log 10 (Initial relative residual) — log 10 (final relative residual) 
number of iterations 

For the computation reported above, this rate of convergence was about 4.5 when two iterations were taken 
and about 5.5 when one iteration was taken. 

It should be noted that these computations determine a flow field which evolves with time. Except for the 
first time step, the pressure vector at the previous time step is used as the initial guess for the pressure vec- 
tor each time the elliptic-solver is called. Hence the guess at each calling is quite good and convergence is 
rapid. 

Computations of a set of nonlinear finite difference equations generalizing the linear ones described 
above have also been run. As noted previously, in this case the elliptic equation for the pressure is nonsepa- 
rable. In a limited number of computations, the PCG code has been found to perform well in this case also. 
A representative computation, one for which / = 31, / = 31 and £ = 10" 5 , was found to take between 2 and 
5 iterations for convergence at each call. The CPU time taken was slightly over 2 s for 2 iterations and slight- 
ly under 5 s for 5 iterations (very roughly a second per iteration generally). In this calculation d defined 
above was found to vary between about one and two. 

In the Appendix to [15] a listing of the elliptic-solver, called FASTSL, is given. To use this code, subrou- 
tines from EISPACK, the Argonne Code Center package, and BLKTRI, a subroutine in the NCAR package 
developed by Schwarztrauber and Sweet [10] are required. 
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